A Framework for Inferring Causality from Multi-Relational Observational Data using Conditional Independence

Authors

  • Sudeepa Roy
  • Babak Salimi
Abstract

The study of causality, or causal inference (how much a given treatment causally affects a given outcome in a population), goes well beyond correlation or association analysis of variables and is critical to making sound data-driven decisions and policies in a multitude of applications. The gold standard in causal inference is the controlled experiment, which is often not possible for logistical or ethical reasons. As an alternative, inferring causality from observational data based on the Neyman-Rubin potential outcome model has been used extensively in statistics, economics, and the social sciences for several decades. In this paper, we present a formal framework for sound causal analysis on observational datasets that are given as multiple relations and where the population under study is obtained by joining these base relations. We study a crucial condition for inferring causality from observational data, called the strong ignorability assumption (the treatment and outcome variables should be independent in the joined relation given the observed covariates), using known conditional independences that hold in the base relations. We also discuss how the structure of the conditional independences in the base relations, given as graphical models, helps infer new conditional independences in the joined relation. The proposed framework combines concepts from databases, statistics, and graphical models, and aims to initiate new research directions spanning these fields to facilitate powerful data-driven decisions in today’s big data world.
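
For reference, the strong ignorability condition named in the abstract is usually stated in the Neyman-Rubin model as sketched below. The symbols are notation assumed here, not taken from the abstract: T is a binary treatment, Y(0) and Y(1) are the potential outcomes, and X is the vector of observed covariates. This is the standard textbook form; the paper's own formalization over joined relations may differ.

```latex
% Strong ignorability (standard form; notation assumed, not from the abstract)
% 1. Unconfoundedness: potential outcomes are independent of treatment given X
% 2. Overlap: every covariate value has a nonzero chance of either treatment
\bigl(Y(0),\, Y(1)\bigr) \;\perp\!\!\!\perp\; T \mid X,
\qquad
0 < \Pr(T = 1 \mid X = x) < 1 \quad \text{for all } x.

% Under both conditions the average treatment effect is identifiable
% from observational data by adjusting for X:
\mathbb{E}\bigl[Y(1) - Y(0)\bigr]
  = \mathbb{E}_{X}\Bigl[\, \mathbb{E}[Y \mid T = 1, X] - \mathbb{E}[Y \mid T = 0, X] \,\Bigr].
```

In the setting the abstract describes, the question is whether the independence in the first line can be established for the joined relation from conditional independences already known to hold in the base relations.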

Similar resources

Non-intuitive conditional independence facts hold in models of network data

Many social scientists and researchers across a wide range of fields focus on analyzing a single causal dependency or a conditional model of some outcome variable. However, to reason about interventions or conditional independence, it is useful to construct a joint model of a domain. Researchers in computer science, statistics, and philosophy have developed representations (e.g., Bayesian netwo...

Lifted Representation of Relational Causal Models Revisited: Implications for Reasoning and Structure Learning

Maier et al. (2010) introduced the relational causal model (RCM) for representing and inferring causal relationships in relational data. A lifted representation, called abstract ground graph (AGG), plays a central role in reasoning with and learning of RCM. The correctness of the algorithm proposed by Maier et al. (2013a) for learning RCM from data relies on the soundness and completeness of AG...

Reasoning about Independence in Probabilistic Models of Relational Data

Bayesian networks leverage conditional independence to compactly encode joint probability distributions. Many learning algorithms exploit the constraints implied by observed conditional independencies to learn the structure of Bayesian networks. The rules of d-separation provide a theoretical and algorithmic framework for deriving conditional independence facts from model structure. However, t...

Towards Conditional Independence Test for Relational Data

Conditional independence (CI) tests play a central role in statistical inference, machine learning, and causal discovery. Most existing CI tests assume that the samples are independently and identically distributed (i.i.d.). However, this assumption often does not hold in the case of relational data. We define Relational Conditional Independence (RCI), a generalization of CI to the relational s...

A Kernel Conditional Independence Test for Relational Data

Conditional independence (CI) tests play a central role in statistical inference, machine learning, and causal discovery. Most existing CI tests assume that the samples are independently and identically distributed (i.i.d.). However, this assumption often does not hold in the case of relational data. We define Relational Conditional Independence (RCI), a generalization of CI to the relational s...

Journal:
  • CoRR

Volume: abs/1708.02536

Publication date: 2017